[#441] [IMPROVE] Preload embedding model at startup #461
[#441] [IMPROVE] Preload embedding model at startup #461sahilds1 wants to merge 17 commits intoCodeForPhilly:developfrom
Conversation
… routed to the application instance
There was a problem hiding this comment.
Pull request overview
This PR aims to ensure the SentenceTransformer embedding model is loaded during Django startup (before traffic hits the instance) and to make the embeddings search logic more testable by factoring it into smaller functions.
Changes:
- Refactors
get_closest_embeddingsby extracting query building, evaluation, and usage logging into helper functions. - Adds pytest + pytest-django support (requirements +
pytest.ini) and new unit tests for embedding service helpers. - Updates GitHub Actions workflow and README to run backend tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
server/requirements.txt |
Adds pytest dependencies to support running backend tests. |
server/pytest.ini |
Configures pytest-django settings/module and python path for the server package. |
server/api/services/test_embedding_services.py |
Adds unit tests for query evaluation and usage logging helpers. |
server/api/services/embedding_services.py |
Refactors embeddings search into build_query, evaluate_query, log_usage, and reworks get_closest_embeddings. |
server/api/apps.py |
Attempts to preload the embedding model during Django app initialization via ready(). |
README.md |
Documents how to run backend tests inside the backend container. |
.github/workflows/python-app.yml |
Changes CI branch targets and adds dependency install + pytest execution. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| @@ -32,59 +31,52 @@ def get_closest_embeddings( | |||
|
|
|||
| Returns | |||
| ------- | |||
| list[dict] | |||
| List of dictionaries containing embedding results with keys: | |||
| - name: document name | |||
| - text: embedded text content | |||
| - page_number: page number in source document | |||
| - chunk_number: chunk number within the document | |||
| - distance: L2 distance from query embedding | |||
| - file_id: GUID of the source file | |||
| QuerySet | |||
| Unevaluated Django QuerySet ordered by L2 distance, sliced to num_results | |||
| """ | |||
|
|
|||
| encoding_start = time.time() | |||
| transformerModel = TransformerModel.get_instance().model | |||
| embedding_message = transformerModel.encode(message_data) | |||
| encoding_time = time.time() - encoding_start | |||
|
|
|||
| db_query_start = time.time() | |||
|
|
|||
| # Django QuerySets are lazily evaluated | |||
| if user.is_authenticated: | |||
| # User sees their own files + files uploaded by superusers | |||
| closest_embeddings_query = ( | |||
| Embeddings.objects.filter( | |||
| Q(upload_file__uploaded_by=user) | Q(upload_file__uploaded_by__is_superuser=True) | |||
| ) | |||
| .annotate( | |||
| distance=L2Distance("embedding_sentence_transformers", embedding_message) | |||
| ) | |||
| .order_by("distance") | |||
| queryset = Embeddings.objects.filter( | |||
| Q(upload_file__uploaded_by=user) | Q(upload_file__uploaded_by__is_superuser=True) | |||
| ) | |||
| else: | |||
| # Unauthenticated users only see superuser-uploaded files | |||
| closest_embeddings_query = ( | |||
| Embeddings.objects.filter(upload_file__uploaded_by__is_superuser=True) | |||
| .annotate( | |||
| distance=L2Distance("embedding_sentence_transformers", embedding_message) | |||
| ) | |||
| .order_by("distance") | |||
| ) | |||
| queryset = Embeddings.objects.filter(upload_file__uploaded_by__is_superuser=True) | |||
|
|
|||
| queryset = ( | |||
| queryset | |||
| .annotate(distance=L2Distance("embedding_sentence_transformers", embedding_vector)) | |||
| .order_by("distance") | |||
| ) | |||
|
|
|||
| # Filtering to a document GUID takes precedence over a document name | |||
| if guid: | |||
| closest_embeddings_query = closest_embeddings_query.filter( | |||
| upload_file__guid=guid | |||
| ) | |||
| queryset = queryset.filter(upload_file__guid=guid) | |||
| elif document_name: | |||
| closest_embeddings_query = closest_embeddings_query.filter(name=document_name) | |||
| queryset = queryset.filter(name=document_name) | |||
|
|
|||
| # Slicing is equivalent to SQL's LIMIT clause | |||
| closest_embeddings_query = closest_embeddings_query[:num_results] | |||
| return queryset[:num_results] | |||
There was a problem hiding this comment.
build_query() introduces/relocates important filtering + precedence logic (authenticated vs unauthenticated visibility; guid-over-document_name; LIMIT slicing), but the new tests only cover evaluate_query and log_usage. Add unit/integration tests covering build_query behavior (e.g., guid precedence and the authenticated/unauthenticated queryset filters) to prevent regressions in access control and filtering.
There was a problem hiding this comment.
Building on Copilot's comment, the specifics of the QuerySet object's structure aren't publicly documented. To inspect the QuerySets, we should actually execute them.
There's a couple ways we handle DB access for these tests. We could use [pytest-django's ``@pytest.mark.django_db](https://pytest-django.readthedocs.io/en/latest/database.html), which wraps the test in a transaction the rolls back automatically afterwards. Django also has a built-in django.test.TestCase`, which does a similar thing.
There was a problem hiding this comment.
Thanks for sharing the docs references -- I added tests for build_query and didn't have to access the database because I was able to inspect which methods and arguments were called on the model ("Embeddings")
| def ready(self): | ||
| from .services.sentencetTransformer_model import TransformerModel | ||
| TransformerModel.get_instance() |
There was a problem hiding this comment.
ApiConfig.ready() will only run if this AppConfig is actually used by Django. Right now INSTALLED_APPS appears to include just "api" (not "api.apps.ApiConfig"), and api/__init__.py doesn’t set a default config, so this preload hook may never execute. Consider updating INSTALLED_APPS to reference api.apps.ApiConfig (or otherwise ensuring this config is selected) so the model is preloaded as intended.
There was a problem hiding this comment.
The model preloads as intended because Django ≥ 3.2 auto discovers AppConfig subclasses in apps.py
| def ready(self): | ||
| from .services.sentencetTransformer_model import TransformerModel | ||
| TransformerModel.get_instance() |
There was a problem hiding this comment.
Calling TransformerModel.get_instance() unconditionally in ready() will run for every Django startup context (tests, migrations, management commands, autoreload) and can trigger a large model download/init even when no web traffic will be served. Consider gating this preload behind an explicit env flag (or limiting it to the web server entrypoint) to avoid slowing/fragilizing CI and one-off management commands.
There was a problem hiding this comment.
I added a guard to only preload the model when we're actually going to serve requests
There was a problem hiding this comment.
This is good, the refactor of get_closest_embeddings() was definitely needed. For build_query we need some tests. It's a little more complicated than simple unit testing but I've left some docs references in my comments.
I'm not sure how CI will work right now, with preloading the model in ready. Also, once we have test coverage for build_query that may change CI requirements.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # --------------------------------------------------------------------------- | ||
|
|
||
| # Only forwarded to L2Distance | ||
| EMBEDDING_VECTOR = [0.1, 0.2, 0.3] |
There was a problem hiding this comment.
Trailing whitespace after the EMBEDDING_VECTOR literal is likely unintentional and can cause noisy diffs/lint failures. Please remove the extra spaces.
| EMBEDDING_VECTOR = [0.1, 0.2, 0.3] | |
| EMBEDDING_VECTOR = [0.1, 0.2, 0.3] |
| import logging | ||
| from statistics import median | ||
|
|
||
| # Django filter() only does ADD logic |
There was a problem hiding this comment.
The comment says “Django filter() only does ADD logic”, which looks like a typo and is also misleading: queryset filters are combined with AND semantics by default, while OR is possible via Q objects. Consider correcting/removing this comment to avoid confusing future readers.
| # Django filter() only does ADD logic | |
| # Use Q objects to express OR conditions in Django queries |
| except Exception as e: | ||
| logger.error(f"Failed to create semantic search usage database record: {e}") |
There was a problem hiding this comment.
log_usage() swallows all exceptions, but logger.error(f"... {e}") drops the traceback, making it much harder to debug production failures when SemanticSearchUsage writes fail. Prefer logger.exception(...) (or logger.error(..., exc_info=True)) so the stack trace is captured while still not interrupting the request.
| except Exception as e: | |
| logger.error(f"Failed to create semantic search usage database record: {e}") | |
| except Exception: | |
| logger.exception("Failed to create semantic search usage database record") |
| pytest | ||
| pytest-django |
There was a problem hiding this comment.
pytest and pytest-django are added without version pins. Since these packages can introduce breaking behavior across major/minor releases, consider pinning them (or constraining with compatible ranges) to keep test runs reproducible across environments and container rebuilds.
| pytest | |
| pytest-django | |
| pytest>=8.0.0,<9.0.0 | |
| pytest-django>=4.8.0,<5.0.0 |
| # Note: paraphrase-MiniLM-L6-v2 (~80MB) is downloaded from HuggingFace on first | ||
| # use and cached to ~/.cache/torch/sentence_transformers/ inside the container. | ||
| # That cache is ephemeral — every container rebuild re-downloads the model unless | ||
| # a volume is mounted at that path. | ||
| from .services.sentencetTransformer_model import TransformerModel | ||
| TransformerModel.get_instance() |
There was a problem hiding this comment.
Preloading the SentenceTransformer model during AppConfig.ready() can fail due to transient network/cache issues (e.g., HuggingFace downtime), which would now prevent Django from starting and serving any traffic. Consider wrapping TransformerModel.get_instance() in a try/except with logger.exception(...), and optionally allowing startup to continue (falling back to lazy load on first request) or making “fail fast” behavior configurable via an env var.
Title: Preload embedding model at startup for search and file upload
Description
Preload SentenceTransformer model at Django startup before traffic is routed to the application instance
Add tests for the embeddings services by pulling apart the core logic to make testing easier
Related Issue
GitHub Issue #441
Manual Tests
Automated Tests
Documentation
Updated README with instructions for running backend tests
Reviewers
@taichan03 @amahuli03
Notes